Search CORE

45 research outputs found

Combining Multi-Scale Character Recognition and Linguistic Knowledge for Natural Scene Text OCR

Author: Elagouni Khaoula
Garcia Christophe
Mamalet Franck
Sébillot Pascale
Publication venue: HAL CCSD
Publication date: 27/03/2012
Field of study

International audienceUnderstanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate a significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method to recognize scene texts avoiding the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, relying on a neural classification approach, to every window in order to recognize valid characters and identify non valid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors due to recognition confusions. The designed method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

Author: A Dempster
C Garcia
Christophe Garcia
D Chen
Franck Mamalet
H Li
J Lim
J Weinman
K Jung
Khaoula Elagouni
L Bahl
M Li
Pascale Sébillot
Q Ye
R Casey
R Yager
S Lucas
T Sato
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2014
Field of study

International audienceText embedded in multimedia documents represents an important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that allows to jointly localize and recognize characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results showing that the proposed approaches outperform the state-of-the-art methods

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

HAL-Rennes 1

Adéquation algorithme-architecture pour les réseaux de neurones à convolution: application à l'analyse de visages embarquée

Author: Mamalet Franck
Publication venue: HAL CCSD
Publication date: 06/07/2011
Field of study

Proliferation of image sensors in many electronic devices, and increasing processing capabilities of such sensors, open a field of exploration for the implementation and optimization of complex image processing algorithms in order to provide embedded vision systems. This work is a contribution in the research domain of algorithm-architecture matching. It focuses on a class of algorithms called convolution neural network (ConvNet) and its applications in embedded facial analysis. The facial analysis framework, introduced by Garcia et al., was chosen for its state of the art performances in detection/recognition, and also for its homogeneity based on ConvNets. The first contribution of this work deals with an adequacy study of this facial analysis framework with embedded processors. We propose several algorithmic adaptations of ConvNets, and show that they can lead to significant speedup factors (up to 700) on an embedded processor for mobile phone, without performance degradation. We then present a study of ConvNets parallelization capabilities, through N. Farrugia's PhD work. A coarse-grain parallelism exploration of ConvNets, followed by study of internal scheduling of elementary processors, lead to a parameterized parallel architecture on FPGA, able to detect faces at more than 10 VGA frames per second. Finally, we propose an extension of these studies to the learning phase of neural networks. We analyze several hypothesis space restrictions for ConvNets, and show, on a case study, that classification rate performances are almost the same with a training time divided by up to five.La prolifération des capteurs d'images dans de nombreux appareils électroniques, et l'évolution des capacités de traitements à proximité de ces capteurs ouvrent un champ d'exploration pour l'implantation et l'optimisation d'algorithmes complexes de traitement d'images afin de proposer des systèmes de vision artificielle embarquée. Ces travaux s'inscrivent dans la problématique dite d'adéquation algorithme-architecture (A3). Ils portent sur une classe d'algorithmes appelée réseau de neurones à convolutions (ConvNet) et ses applications en analyse de visages embarquée. La chaîne d'analyse de visages, introduite par Garcia et al., a été choisie d'une part pour ses performances en taux de détection/reconnaissance au niveau de l'état de l'art, et d'autre part pour son caractère homogène reposant sur des ConvNets. La première contribution de ces travaux porte sur une étude d'adéquation de cette chaîne d'analyse de visages aux processeurs embarqués. Nous proposons plusieurs adaptations algorithmiques des ConvNets, et montrons que celles-ci permettent d'obtenir des facteurs d'accélération importants (jusqu'à 700) sur un processeur embarqué pour mobile, sans dégradation des performances en taux de détection/reconnaissance. Nous présentons ensuite une étude des capacités de parallélisation des ConvNets, au travers des travaux de thèse de N. Farrugia. Une exploration "gros-grain" du parallélisme des ConvNets, suivie d'une étude de l'ordonnancement interne des processeurs élémentaires, conduisent à une architecture parallèle paramétrable, capable de détecter des visages à plus de 10 images VGA par seconde sur FPGA. Nous proposons enfin une extension de ces études à la phase d'apprentissage de ces réseaux de neurones. Nous étudions des restrictions de l'espace des hypothèses d'apprentissage, et montrons, sur un cas d'application, que les capacités d'apprentissage des ConvNets ne sont pas dégradées, et que le temps d'apprentissage peut être réduit jusqu'à un facteur cinq

Hal-Diderot

Algorithm-architecture matching for convolutional neural network : application to embedded facial analysis

Author: Mamalet Franck
Publication venue
Publication date: 06/07/2011
Field of study

La prolifération des capteurs d'images dans de nombreux appareils électroniques, et l'évolution des capacités de traitements à proximité de ces capteurs ouvrent un champ d'exploration pour l'implantation et l'optimisation d'algorithmes complexes de traitement d'images afin de proposer des systèmes de vision artificielle embarquée. Ces travaux s'inscrivent dans la problématique dite d'adéquation algorithme-architecture (A3). Ils portent sur une classe d'algorithmes appelée réseau de neurones à convolutions (ConvNet) et ses applications en analyse de visages embarquée. La chaîne d'analyse de visages, introduite par Garcia et al., a été choisie d'une part pour ses performances en taux de détection/reconnaissance au niveau de l'état de l'art, et d'autre part pour son caractère homogène reposant sur des ConvNets. La première contribution de ces travaux porte sur une étude d'adéquation de cette chaîne d'analyse de visages aux processeurs embarqués. Nous proposons plusieurs adaptations algorithmiques des ConvNets, et montrons que celles-ci permettent d'obtenir des facteurs d'accélération importants (jusqu'à 700) sur un processeur embarqué pour mobile, sans dégradation des performances en taux de détection/reconnaissance. Nous présentons ensuite une étude des capacités de parallélisation des ConvNets, au travers des travaux de thèse de N. Farrugia. Une exploration "gros-grain" du parallélisme des ConvNets, suivie d'une étude de l'ordonnancement interne des processeurs élémentaires, conduisent à une architecture parallèle paramétrable, capable de détecter des visages à plus de 10 images VGA par seconde sur FPGA. Nous proposons enfin une extension de ces études à la phase d'apprentissage de ces réseaux de neurones. Nous étudions des restrictions de l'espace des hypothèses d'apprentissage, et montrons, sur un cas d'application, que les capacités d'apprentissage des ConvNets ne sont pas dégradées, et que le temps d'apprentissage peut être réduit jusqu'à un facteur cinq.Proliferation of image sensors in many electronic devices, and increasing processing capabilities of such sensors, open a field of exploration for the implementation and optimization of complex image processing algorithms in order to provide embedded vision systems. This work is a contribution in the research domain of algorithm-architecture matching. It focuses on a class of algorithms called convolution neural network (ConvNet) and its applications in embedded facial analysis. The facial analysis framework, introduced by Garcia et al., was chosen for its state of the art performances in detection/recognition, and also for its homogeneity based on ConvNets. The first contribution of this work deals with an adequacy study of this facial analysis framework with embedded processors. We propose several algorithmic adaptations of ConvNets, and show that they can lead to significant speedup factors (up to 700) on an embedded processor for mobile phone, without performance degradation. We then present a study of ConvNets parallelization capabilities, through N. Farrugia's PhD work. A coarse-grain parallelism exploration of ConvNets, followed by study of internal scheduling of elementary processors, lead to a parameterized parallel architecture on FPGA, able to detect faces at more than 10 VGA frames per second. Finally, we propose an extension of these studies to the learning phase of neural networks. We analyze several hypothesis space restrictions for ConvNets, and show, on a case study, that classification rate performances are almost the same with a training time divided by up to five

Theses.fr

Adéquation algorithme-architecture pour les réseaux de neurones à convolution (application à l'analyse de visages embarquée)

Author: GARCIA Christophe
MAMALET Franck
Publication venue
Publication date: 01/01/2011
Field of study

OpenGrey Repository

Simplifying ConvNets for Fast Learning

Author: Garcia Christophe
Mamalet Franck
Publication venue: Springer
Publication date: 11/09/2012
Field of study

International audienceIn this paper, we propose diﬀerent strategies for simplifying ﬁlters, used as feature extractors, to be learnt in convolutional neural networks (ConvNets) in order to modify the hypothesis space, and to speed-up learning and processing times. We study two kinds of ﬁlters that are known to be computationally eﬃcient in feed-forward processing: fused convolution/sub-sampling ﬁlters, and separable ﬁlters. We compare the complexity of the back-propagation algorithm on ConvNets based on these diﬀerent kinds of ﬁlters. We show that using these ﬁlters allows to reach the same level of recognition performance as with classical ConvNets for handwritten digit recognition, up to 3.3 times faster

HAL

Real-Time Video Convolutional Face Finder on Embedded Platforms

Author: Garcia Christophe
Mamalet Franck
Roux S&#233
Publication venue: Springer
Publication date: 01/01/2007
Field of study

<p/> <p>A high-level optimization methodology is applied for implementing the well-known convolutional face finder (CFF) algorithm for real-time applications on mobile phones, such as teleconferencing, advanced user interfaces, image indexing, and security access control. CFF is based on a feature extraction and classification technique which consists of a pipeline of convolutions and subsampling operations. The design of embedded systems requires a good trade-off between performance and code size due to the limited amount of available resources. The followed methodology copes with the main drawbacks of the original implementation of CFF such as floating-point computation and memory allocation, in order to allow parallelism exploitation and perform algorithm optimizations. Experimental results show that our embedded face detection system can accurately locate faces with less computational load and memory cost. It runs on a 275 MHz Starcore DSP at 35 QCIF images/s with state-of-the-art detection rates and very low false alarm rates.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks

Author: Malgouyres François
Mamalet Franck
Mehdi Achour El
Publication venue: Microtome Publishing
Publication date: 01/01/2022
Field of study

International audienceImposing orthogonality on the layers of neural networks is known to facilitate the learning by limiting the exploding/vanishing of the gradient; decorrelate the features; improve the robustness. This paper studies the theoretical properties of orthogonal convolutional layers.We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. The conditions prove that orthogonal convolutional transforms exist for almost all architectures used in practice for 'circular' padding.We also exhibit limitations with 'valid' boundary conditions and 'same' boundary conditions with zero-padding.Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020).The second motivation of the present paper is to specify the theory behind this.We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric.The theoretical results are confirmed with experiments and the landscape of the regularization term is studied. Experiments on real data sets show that when orthogonality is used to enforce robustness, the parameter multiplying the regularization termcan be used to tune a tradeoff between accuracy and orthogonality, for the benefit of both accuracy and robustness.Altogether, the study guarantees that the regularization proposed in Wang et al. (2020) is an efficient, flexible and stable numerical strategy to learn orthogonal convolutional layers

Scientific Publications of the University of Toulouse II Le Mirail

A Comparison between Multi-Layer Perceptrons and Convolutional Neural Networks for Text Image Super-Resolution

Author: Garcia Christophe
Mamalet Franck
Peyrard Clément
Publication venue: HAL CCSD
Publication date: 11/03/2015
Field of study

International audienceWe compare the performances of several Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (ConvNets) for single text image Super-Resolution. We propose an example-based framework for both MLP and ConvNet, where a non-linear mapping between pairs of patches and high-frequency pixel values is learned. We then demonstrate that for equivalent complexity, ConvNets are better than MLPs at predicting missing details in upsampled text images. To evaluate the performances, we make use of a recent database (ULR-textSISR-2013a) along with different quality measures. We show that the proposed methods outperforms sparse coding-based methods for this database

HAL

Hal-Diderot

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks

Author: Malgouyres François
Mamalet Franck
Mehdi Achour El
Publication venue: Microtome Publishing
Publication date: 01/01/2022
Field of study

HAL-INSA Toulouse